ABSTRACT

The assumptions of the classical test-theory model
are used to develop a theory of reliability for criterion-referenced
measures which parallels that for norm-referenced measures. It is
shown that the Spearman-Brown formula holds for criterion-referenced 
measures and that the criterion-referenced reliability coefficient 
can be used to correct criterion-referenced correlations for 
attenuation. A formula is developed which expresses the 
criterion-referenced reliability coefficient in terms of the mean, 
variance, and norm-referenced reliability coefficient. The 
implications of the resulting formula are discussed. (Author/DG) 
The Reliability of Criterion-Referenced Measures

Abstract

The assumptions of the classical test-theory model are used to
develop a theory of reliability for criterion-referenced measures which
parallels that for norm-referenced measures. The criterion-referenced
reliability coefficient is expressed in terms of the mean, variance,
and norm-referenced reliability coefficient, and the implications of
the resulting formula are discussed.

"Criterion-referenced" is a term first used by Glaser (1963) to 
refer to measures that "depend on an absolute standard of quality." 

Thus criterion-referenced measures differ from "norm-referenced" meas- 
ures, which depend on a relative standard. Criterion-referenced (CR) 
measures compare the student's performance with a fixed standard, while 
norm-referenced (NR) measures compare his performance with the perfor- 
mance of a norm group. 

Popham and Husek (1969) have written that "the typical indices of
internal consistency are not appropriate for criterion-referenced
tests," and, at first glance, this point would seem so obvious as to
be irrefutable. Since reliability theory is based on the existence of
differences among the true scores of examinees, and CR measures are
intended to apply to situations in which there may be no such
differences, the two concepts would seem to be incompatible. Yet, with a
few appropriate modifications, the classical theory of test reliability
can be applied to criterion-referenced measures in a way that closely
parallels its traditional application to norm-referenced measures.

The basis for these modifications is a simple substitution. Consider
the basic distinction between NR and CR measures. When we use NR
measures, we are interested in the extent to which an individual score
deviates from the mean score of a norm group. When we use CR measures,
we are interested in the extent to which an individual score deviates
from a fixed standard, the criterion. To adapt traditional
norm-referenced reliability indices to CR measures, one need only
substitute the criterion score for the mean score of the norm group and
redefine the various indices accordingly.

Variance, Covariance, and Correlation 

How can we redefine the variance of scores on a CR test? The
variance of a set of scores is the mean squared deviation of the scores
from the group mean. Since we are interested not in the deviation of
scores from the mean but in their deviation from the criterion, we can
use, in place of the variance, the mean squared deviation of the scores
from the criterion:

$$D_x^2 = E_p (X_{pf} - C_x)^2 \tag{1}$$

where $D_x^2$ denotes the mean squared deviation of the X-measures from
$C_x$, $X_{pf}$ is the obtained score of person p on form f, $C_x$ is
the criterion, and $E_p$ indicates the expected value over persons.

Since the concepts of covariance and correlation depend on
differences in scores, they, too, will have to be redefined. In place of
covariance, we have a mean product of deviations:

$$D_{xy} = E_p (X_{pf} - C_x)(Y_{pf} - C_y) \tag{2}$$

The criterion-referenced correlation coefficient can then be defined as

$$\rho_c(X, Y) = \frac{D_{xy}}{D_x D_y} \tag{3}$$

$\rho_c$ is a product-moment correlation based on moments about the
arbitrary origins $C_x$ and $C_y$, rather than about the means. The
Pearson product-moment correlation, which will be referred to in this
paper as the norm-referenced correlation $\rho_N$, is thus a special
case of $\rho_c$ (with some special properties which do not generalize
to other cases of $\rho_c$).
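Definitions (1) through (3) can be illustrated with a minimal Python sketch. The function names, the four-student score lists, and the criterion values are hypothetical choices made only for illustration; expectations over persons are approximated by plain sample means.

```python
from math import sqrt

def mean_sq_dev(scores, criterion):
    """D^2: mean squared deviation of scores from the criterion, equation (1)."""
    return sum((x - criterion) ** 2 for x in scores) / len(scores)

def mean_product_dev(xs, ys, cx, cy):
    """D_xy: mean product of deviations from the two criteria, equation (2)."""
    return sum((x - cx) * (y - cy) for x, y in zip(xs, ys)) / len(xs)

def cr_correlation(xs, ys, cx, cy):
    """rho_c(X, Y) = D_xy / (D_x * D_y), equation (3)."""
    d_x = sqrt(mean_sq_dev(xs, cx))
    d_y = sqrt(mean_sq_dev(ys, cy))
    return mean_product_dev(xs, ys, cx, cy) / (d_x * d_y)

# Hypothetical scores of four students on two tests (both means equal 8).
x = [6, 7, 9, 10]
y = [5, 8, 8, 11]

# With each criterion set at the test mean, rho_c is the Pearson correlation.
pearson = cr_correlation(x, y, 8, 8)

# With criteria placed below every score, all deviations are positive and,
# for these particular data, rho_c is larger.
shifted = cr_correlation(x, y, 5, 4)
```

Setting both criteria to the means reproduces the norm-referenced correlation, which is the special case described in the text.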

Definitions, Assumptions, and Basic Theorems

Since the criterion is chosen without reference to the distribution
of scores, we can define the criterion of a sum of measurements in
any way we choose. However, in order to construct indices of
reliability which parallel those for norm-referenced measurement, we will
have to define the criterion of a sum of measures as the sum of their
criteria:

$$C_{(X + Y)} = C_x + C_y$$

More generally,

$$C_{\left(\sum_{i=1}^{n} X_i\right)} = \sum_{i=1}^{n} C_{x_i} \tag{4}$$

It follows that

$$C_{(nX)} = n C_x \tag{5}$$



True scores and errors of measurement are defined exactly as for
NR measures:

$$T_p = E_f (X_{pf}) \quad \text{and} \quad e_{pf} = X_{pf} - T_p .$$

That is, the true score of person p equals the expected value (over
forms) of his obtained score; his error of measurement on a given form
is the difference between his obtained score on that form and his true
score.

The concept of true-score variance must be replaced by the mean
squared deviation of true scores from the criterion:

$$D_t^2 = E_p (T_p - C_x)^2 \tag{6}$$

Classical test theory assumes that errors of measurement on separate
measures do not covary over persons or over forms; the same
assumptions can be made for CR measures:

$$E_p (e_{pf} e_{pf'}) = 0 \quad (f \neq f')$$

Classical test theory also assumes that errors of measurement do
not covary with true scores on the same or on other measures. It
follows that errors of measurement do not covary with the deviation of
true scores from the criterion:

$$E_p [e_{pf} (T_p - C_x)] = 0 \tag{7}$$
We can now prove a theorem analogous to the theorem for NR
measures which states that the variance (over persons) of obtained scores
equals the variance of true scores plus the variance of errors of
measurement. For CR measures, the theorem states that the mean squared
deviation of obtained scores from the criterion equals the mean squared
deviation of true scores from the criterion, plus the variance of
errors of measurement. The proof of this latter theorem is as follows:

$$
\begin{aligned}
D_x^2 &= E_p (X_{pf} - C_x)^2 = E_p [(T_p + e_{pf}) - C_x]^2 \\
&= E_p [(T_p - C_x) + e_{pf}]^2 \\
&= E_p (T_p - C_x)^2 + E_p (e_{pf})^2 + 2 E_p [e_{pf} (T_p - C_x)] \\
&= D_t^2 + \sigma_e^2 + 2(0) \\
&= D_t^2 + \sigma_e^2 .
\end{aligned}
\tag{8}
$$
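The decomposition in equation (8) can be checked numerically. The sketch below is an illustration only: the four true scores, the errors, and the criterion are hypothetical values, chosen so that the errors average zero and have zero mean product with the true-score deviations, as equation (7) assumes.

```python
def mean(values):
    """Sample mean, standing in for the expectation over persons E_p."""
    return sum(values) / len(values)

# Hypothetical data for four persons on a single form.
true_scores = [2, 2, 4, 4]
errors = [1, -1, 1, -1]          # mean zero, uncorrelated with true scores
obtained = [t + e for t, e in zip(true_scores, errors)]
criterion = 2.5                  # illustrative criterion C_x

d2_x = mean([(x - criterion) ** 2 for x in obtained])      # D_x^2
d2_t = mean([(t - criterion) ** 2 for t in true_scores])   # D_t^2
var_e = mean([e ** 2 for e in errors])                     # sigma_e^2 (errors have mean zero)

# Cross-product term that vanishes by equation (7).
cross = mean([e * (t - criterion) for t, e in zip(true_scores, errors)])

# Equation (8): D_x^2 = D_t^2 + sigma_e^2 (here 2.25 = 1.25 + 1.0).
decomposition_holds = (d2_x == d2_t + var_e)
```

With real data the cross-product term is only approximately zero in any finite sample, so the equality would hold only approximately.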



The Reliability Coefficient 



Lord and Novick (1968, p. 61) define the reliability coefficient
for norm-referenced measures as the squared correlation between true
scores and obtained scores. We can follow their example and define the
criterion-referenced reliability coefficient as the squared CR
correlation between true scores and obtained scores, $\rho_c^2(T, X)$.
The mean product of deviations of true scores and obtained scores is

$$
\begin{aligned}
D_{tx} &= E_p (T_p - C_x)(X_{pf} - C_x) \\
&= E_p (T_p - C_x)[(T_p + e_{pf}) - C_x] \\
&= E_p (T_p - C_x)[(T_p - C_x) + e_{pf}] \\
&= E_p (T_p - C_x)^2 + E_p [e_{pf} (T_p - C_x)] \\
&= D_t^2 + 0 = D_t^2 .
\end{aligned}
$$

Therefore,

$$\rho_c^2(T, X) = \frac{D_{tx}^2}{D_t^2 D_x^2} = \frac{(D_t^2)^2}{D_t^2 D_x^2} = \frac{D_t^2}{D_x^2} \tag{9}$$



This result shows that the reliability coefficient of a
criterion-referenced measure can be interpreted as a ratio of mean squared
deviations from the criterion, just as the reliability coefficient of a
norm-referenced measure can be interpreted as a ratio of variances.

We can define parallel measurements just as for NR measures, with
the additional requirement that parallel measurements have equal
criteria. Then two criterion-referenced measures $X_1$ and $X_2$ are
parallel if and only if the following conditions hold:

$$T_{p1} = T_{p2} \ \text{for all } p ; \qquad \sigma_{e_1} = \sigma_{e_2} ; \qquad \text{and} \quad C_1 = C_2 .$$



We can then show that the correlation of two parallel measures X
and X' is equal to the reliability coefficient of X. The proof is
as follows (the notation has been simplified to avoid two levels of
subscripts):

$$\rho_c(X, X') = \frac{D_{xx'}}{D_x D_{x'}}$$

Expanding the numerator,

$$
\begin{aligned}
D_{xx'} &= E_p (X_p - C)(X'_p - C) \\
&= E_p [(T_p + e_p) - C][(T_p + e'_p) - C] \\
&= E_p [(T_p - C) + e_p][(T_p - C) + e'_p] \\
&= E_p (T_p - C)^2 + E_p (e_p e'_p) + E_p [e_p (T_p - C)] + E_p [e'_p (T_p - C)] \\
&= D_t^2 + 0 + 0 + 0 = D_t^2 .
\end{aligned}
$$

From equation (8), $D_x^2 = D_t^2 + \sigma_e^2$; therefore,

$$\rho_c(X, X') = \frac{D_t^2}{\sqrt{(D_t^2 + \sigma_e^2)(D_t^2 + \sigma_{e'}^2)}}$$



But, by the definition of parallel measurements stated earlier,
$\sigma_e^2 = \sigma_{e'}^2$; therefore,

$$\rho_c(X, X') = \frac{D_t^2}{D_t^2 + \sigma_e^2} = \frac{D_t^2}{D_x^2} = \rho_c^2(T, X) \tag{10}$$

The Spearman-Brown Formula

Does the Spearman-Brown formula hold for criterion-referenced
measures? It does, and its derivation for CR measures parallels that
for NR measures. Suppose we want to know the criterion-referenced
reliability of a sum of n parallel measurements. By the definition
of parallel measurements, all n criteria are equal; therefore, from
equation (4), the criterion for the sum is $nC_x$.

The mean squared deviation of the true scores is

$$D_{(\Sigma T)}^2 = E_p (nT_p - nC_x)^2 = n^2 E_p (T_p - C_x)^2 = n^2 D_t^2 \tag{11}$$



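The mean squared deviation of the obtained scores is

$$
\begin{aligned}
D_{(\Sigma X)}^2 &= E_p \Big( nT_p + \sum_f e_{pf} - nC_x \Big)^2 \\
&= E_p \Big[ n(T_p - C_x) + \sum_f e_{pf} \Big]^2 \\
&= E_p [n^2 (T_p - C_x)^2] + E_p \Big( \sum_f e_{pf} \Big)^2 + E_p \Big[ 2n(T_p - C_x) \sum_f e_{pf} \Big] \\
&= n^2 E_p (T_p - C_x)^2 + E_p \Big( \sum_f e_{pf}^2 + \sum_f \sum_{f' \neq f} e_{pf} e_{pf'} \Big) + 2n E_p \Big[ (T_p - C_x) \sum_f e_{pf} \Big] .
\end{aligned}
$$

The first term equals $n^2 D_t^2$. The second term equals

$$
\begin{aligned}
E_p \Big( \sum_f e_{pf}^2 \Big) + E_p \Big( \sum_f \sum_{f' \neq f} e_{pf} e_{pf'} \Big)
&= \sum_f E_p (e_{pf}^2) + \sum_f \sum_{f' \neq f} E_p (e_{pf} e_{pf'}) \\
&= \sum_f \sigma_e^2 + 0 = n \sigma_e^2 .
\end{aligned}
$$

The third term equals

$$2n E_p \Big[ \sum_f e_{pf} (T_p - C_x) \Big] = 2n \sum_f E_p [e_{pf} (T_p - C_x)] = 0 , \ \text{by equation (7).}$$

Therefore,

$$D_{(\Sigma X)}^2 = n^2 D_t^2 + n \sigma_e^2 \tag{12}$$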



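Then the CR reliability coefficient of the sum, by equation (9), equals

$$
\begin{aligned}
\rho_c^2(\Sigma T, \Sigma X) = \frac{D_{(\Sigma T)}^2}{D_{(\Sigma X)}^2}
&= \frac{n^2 D_t^2}{n^2 D_t^2 + n \sigma_e^2}
= \frac{n D_t^2}{n D_t^2 + \sigma_e^2}
= \frac{n D_t^2}{(n - 1) D_t^2 + (D_t^2 + \sigma_e^2)} \\
&= \frac{n D_t^2}{(n - 1) D_t^2 + D_x^2}
= \frac{n (D_t^2 / D_x^2)}{(n - 1)(D_t^2 / D_x^2) + 1}
= \frac{n \rho_c^2(T, X)}{1 + (n - 1) \rho_c^2(T, X)} .
\end{aligned}
\tag{13}
$$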
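As a quick check on equation (13), the following Python sketch compares the Spearman-Brown formula with the direct ratio of the mean squared deviations of the summed true and obtained scores. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
def cr_reliability(d2_t, var_e):
    """CR reliability of one measure: D_t^2 / (D_t^2 + sigma_e^2)."""
    return d2_t / (d2_t + var_e)

def spearman_brown(rel, n):
    """Reliability of a sum of n parallel measurements, equation (13)."""
    return n * rel / (1 + (n - 1) * rel)

def cr_reliability_of_sum(d2_t, var_e, n):
    """Direct ratio for the sum: n^2 D_t^2 / (n^2 D_t^2 + n sigma_e^2)."""
    return (n ** 2 * d2_t) / (n ** 2 * d2_t + n * var_e)

# Illustrative values (not from the paper): D_t^2 = 1.25, sigma_e^2 = 1.0, n = 3.
single = cr_reliability(1.25, 1.0)
via_formula = spearman_brown(single, 3)
via_ratio = cr_reliability_of_sum(1.25, 1.0, 3)
```

The two routes agree up to floating-point rounding, and lengthening the test raises the CR reliability just as it does in the norm-referenced case.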
Correction for Attenuation

The CR reliability coefficient can be used to correct CR
correlations for attenuation. Again, the formula and its derivation parallel
those for NR measures. First we must prove that $D_{T_x T_y} = D_{xy}$:

$$
\begin{aligned}
D_{xy} &= E_p (X_p - C_x)(Y_p - C_y) \\
&= E_p (T_x + e_x - C_x)(T_y + e_y - C_y) \\
&= E_p [(T_x - C_x) + e_x][(T_y - C_y) + e_y] \\
&= E_p (T_x - C_x)(T_y - C_y) + E_p [e_x (T_y - C_y)] + E_p [e_y (T_x - C_x)] + E_p (e_x e_y) \\
&= D_{T_x T_y} + 0 + 0 + 0 \\
&= D_{T_x T_y} .
\end{aligned}
\tag{14}
$$



By the definition of CR correlation, equation (3),

$$\rho_c(T_x, T_y) = \frac{D_{T_x T_y}}{D_{T_x} D_{T_y}}$$

But $D_{T_x T_y} = D_{xy}$, by equation (14).

And, since $\rho_c^2(T_x, X) = D_{T_x}^2 / D_x^2$, then

$$D_{T_x} = D_x \cdot \rho_c(T_x, X) .$$

Similarly, $D_{T_y} = D_y \cdot \rho_c(T_y, Y)$.

Then

$$
\rho_c(T_x, T_y) = \frac{D_{xy}}{D_x \cdot \rho_c(T_x, X) \cdot D_y \cdot \rho_c(T_y, Y)}
= \frac{D_{xy} / (D_x D_y)}{\rho_c(T_x, X) \cdot \rho_c(T_y, Y)}
= \frac{\rho_c(X, Y)}{\rho_c(T_x, X) \cdot \rho_c(T_y, Y)}
\tag{15}
$$
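The correction for attenuation can be sketched in a few lines of Python. The function name and the input values are hypothetical; the function assumes that the reliabilities are supplied as CR reliability coefficients (squared correlations), so their square roots form the denominator.

```python
from math import sqrt

def correct_for_attenuation(rho_xy, rel_x, rel_y):
    """Estimate rho_c(T_x, T_y) from the observed CR correlation.

    rel_x and rel_y are the CR reliability coefficients rho_c^2(T_x, X)
    and rho_c^2(T_y, Y); their square roots are the correlations that
    appear in the denominator of the correction formula.
    """
    return rho_xy / sqrt(rel_x * rel_y)

# Illustrative values only: observed CR correlation .60,
# CR reliabilities .81 and .64, giving a corrected value of about .83.
corrected = correct_for_attenuation(0.60, 0.81, 0.64)
```

As with the norm-referenced correction, the corrected value is never smaller in absolute size than the observed correlation, because both reliabilities are at most 1.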



Computing Criterion-Referenced Indices from Norm-Referenced Indices 

Suppose we have computed (or have a computer program for computing)
the traditional norm-referenced indices for a set of scores: the mean,
variance, and estimated reliability coefficient. Can we use these
norm-referenced indices to compute criterion-referenced indices, including
the reliability coefficient, without having to refer back to each
student's response to each item? The answer is yes; in fact, we can
compute criterion-referenced indices for this set of scores with any
criterion we choose to specify.

Let the mean, variance, and norm-referenced reliability coefficient
be represented by $\mu_x$, $\sigma_x^2$, and $\rho_N^2(T, X)$. Then the
mean squared deviation of obtained scores from the criterion can be
expressed as follows:

$$
\begin{aligned}
D_x^2 &= E_p (X_{pf} - C_x)^2 = E_p [(X_{pf} - \mu_x) + (\mu_x - C_x)]^2 \\
&= E_p (X_{pf} - \mu_x)^2 + E_p (\mu_x - C_x)^2 + 2 E_p (X_{pf} - \mu_x)(\mu_x - C_x) \\
&= \sigma_x^2 + (\mu_x - C_x)^2 + 2 (\mu_x - C_x) E_p (X_{pf} - \mu_x) \\
&= \sigma_x^2 + (\mu_x - C_x)^2 .
\end{aligned}
\tag{16}
$$



A similar derivation holds for the mean squared deviation of true
scores from the criterion. The result is

$$D_t^2 = \sigma_t^2 + (\mu_t - C_x)^2 = \rho_N^2(T, X) \sigma_x^2 + (\mu_x - C_x)^2 . \tag{17}$$



The mean product of deviations for two CR measures can be expressed
in terms of the means, criteria, and covariance of the two measures:

$$
\begin{aligned}
D_{xy} &= E_p (X_{pf} - C_x)(Y_{pf} - C_y) \\
&= E_p [(X_{pf} - \mu_x) + (\mu_x - C_x)][(Y_{pf} - \mu_y) + (\mu_y - C_y)] \\
&= E_p (X_{pf} - \mu_x)(Y_{pf} - \mu_y) + E_p (\mu_x - C_x)(\mu_y - C_y) \\
&\quad + E_p (\mu_x - C_x)(Y_{pf} - \mu_y) + E_p (\mu_y - C_y)(X_{pf} - \mu_x) \\
&= \sigma_{xy} + (\mu_x - C_x)(\mu_y - C_y) + (\mu_x - C_x) E_p (Y_{pf} - \mu_y) \\
&\quad + (\mu_y - C_y) E_p (X_{pf} - \mu_x) \\
&= \sigma_{xy} + (\mu_x - C_x)(\mu_y - C_y) .
\end{aligned}
\tag{18}
$$
Then the criterion-referenced correlation coefficient can be
expressed in terms of norm-referenced indices:

$$
\rho_c(X, Y) = \frac{D_{xy}}{D_x D_y}
= \frac{\rho_N(X, Y) \sigma_x \sigma_y + (\mu_x - C_x)(\mu_y - C_y)}
{\sqrt{[\sigma_x^2 + (\mu_x - C_x)^2][\sigma_y^2 + (\mu_y - C_y)^2]}}
\tag{19}
$$
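The expression for the CR correlation in terms of norm-referenced indices translates directly into code. The sketch below is illustrative: the function name, argument names, and numerical values are assumptions, and standard deviations (not variances) are passed in.

```python
from math import sqrt

def cr_correlation_from_nr(r_xy, mean_x, sd_x, c_x, mean_y, sd_y, c_y):
    """CR correlation from the NR correlation, means, SDs, and criteria."""
    numerator = r_xy * sd_x * sd_y + (mean_x - c_x) * (mean_y - c_y)
    denominator = sqrt((sd_x ** 2 + (mean_x - c_x) ** 2)
                       * (sd_y ** 2 + (mean_y - c_y) ** 2))
    return numerator / denominator

# Illustrative values: NR correlation .50, means 8 and 8, SDs 2 and 3.
# Criteria at the means leave the correlation unchanged (.50).
at_means = cr_correlation_from_nr(0.5, 8, 2, 8, 8, 3, 8)

# Criteria below both means (6 and 5) raise it to .75 for these values.
below_means = cr_correlation_from_nr(0.5, 8, 2, 6, 8, 3, 5)
```

When the two means lie on opposite sides of their criteria, the product of the mean deviations is negative and the CR correlation can fall below the NR correlation.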



Since we can express the mean squared deviation of obtained scores
and that of true scores in terms of norm-referenced indices, we can do
the same for their ratio, which is the criterion-referenced reliability
coefficient:

$$
\rho_c^2(T, X) = \frac{D_t^2}{D_x^2}
= \frac{\rho_N^2(T, X) \sigma_x^2 + (\mu_x - C_x)^2}{\sigma_x^2 + (\mu_x - C_x)^2}
\tag{20}
$$
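A minimal Python sketch of this computation follows. The function name and the example values are hypothetical; the inputs are the obtained-score mean and variance, the NR reliability coefficient, and whatever criterion one chooses to specify.

```python
def cr_reliability_from_nr(mean, variance, nr_reliability, criterion):
    """CR reliability from the mean, variance, and NR reliability."""
    shift = (mean - criterion) ** 2  # squared distance from mean to criterion
    return (nr_reliability * variance + shift) / (variance + shift)

# Illustrative values only: mean 8, variance 4, NR reliability .50.
at_mean = cr_reliability_from_nr(8, 4, 0.5, 8)      # criterion at the mean: .50
one_sd_away = cr_reliability_from_nr(8, 4, 0.5, 6)  # criterion one SD below: .75
```

The function divides by zero when the variance is zero and the mean equals the criterion, which corresponds to the case in which the coefficient is undefined.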



Implications of Criterion-Referenced Reliability 

Consider the implications of equation (20). As the NR reliability
coefficient increases, the CR reliability coefficient increases. When
the NR reliability coefficient equals 1.00, the CR reliability
coefficient also equals 1.00. In fact, the CR reliability coefficient is
always at least as large as the NR reliability coefficient. The two
reliability coefficients will be equal whenever the mean score falls
exactly at the criterion.

The further from the criterion the mean score falls, the greater
the CR reliability coefficient. The reason for this relationship is
that the mean of the obtained scores is equal to the mean of the true
scores, the point from which the sum of squared deviations of the
individual true scores is the smallest it can be. The farther from this
point the criterion lies, the more reliable information one has about
the deviation of all the individual true scores from the criterion.
For this reason, NR reliability can be considered a special case of CR
reliability: the case in which the mean and the criterion are equal
and the reliability of the test is minimized.

Another way to think about the relationship between the mean, the
criterion, and the CR reliability of the test is in terms of $D_t^2$ and
$\sigma_e^2$. From equations (8), (9), and (17),

$$
\rho_c^2(T, X) = \frac{D_t^2}{D_t^2 + \sigma_e^2}
= \frac{\sigma_t^2 + (\mu_x - C_x)^2}{\sigma_t^2 + (\mu_x - C_x)^2 + \sigma_e^2}
\tag{21}
$$



Increasing the distance between the mean and the criterion increases
the mean squared deviation of the true scores from the criterion,
without any increase in the error variance. As a result, the CR
reliability increases.

How is CR reliability affected by a decrease in the variance of
obtained scores? The answer depends on the nature of the decrease in
variance. If the NR reliability remains constant (that is, if
true-score variance and error variance decrease in the same proportion), the
CR reliability will increase. The effect is the same as that of
increasing $(\mu_x - C_x)^2$ while holding $\sigma_t^2$ and $\sigma_e^2$
constant.

However, a decrease in obtained-score variance is usually
accompanied by a decrease in the NR reliability coefficient. What usually
happens is that the true-score variance decreases while the error
variance remains constant. In this case, of course, CR reliability will
decrease.
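A worked example may make the two kinds of variance reduction concrete. The numbers are hypothetical. Suppose that initially $\sigma_t^2 = 3$, $\sigma_e^2 = 1$, and $(\mu_x - C_x)^2 = 1$, so that the NR reliability is $3/4 = .75$ and

$$\rho_c^2(T, X) = \frac{3 + 1}{3 + 1 + 1} = .80 .$$

If both variances are halved ($\sigma_t^2 = 1.5$, $\sigma_e^2 = .5$), the NR reliability stays at $.75$ while the CR reliability rises:

$$\rho_c^2(T, X) = \frac{1.5 + 1}{1.5 + 1 + .5} \approx .83 .$$

If only the true-score variance is halved ($\sigma_t^2 = 1.5$, $\sigma_e^2 = 1$), the NR reliability falls to $1.5 / 2.5 = .60$ and the CR reliability falls as well:

$$\rho_c^2(T, X) = \frac{1.5 + 1}{1.5 + 1 + 1} \approx .71 .$$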

What about the case of a mastery test, on which all the students
are expected to get perfect scores? If they all get perfect scores,
does the test have no reliability? No, because the criterion is a point
selected to divide the scores above it from those below. Therefore, the
criterion for a mastery test is not a perfect score; it is a perfect
score minus some small fraction of an item. If all the students get
perfect scores, the variances in formula (21) will equal zero. Since
there will still be the difference of a fraction of an item between the
mean and the criterion, the CR reliability will equal 1.00.
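
As a hypothetical illustration, consider a 10-item mastery test on which every student scores 10, with the criterion placed at 9.5 items. Then $\sigma_t^2 = \sigma_e^2 = 0$ and $(\mu_x - C_x)^2 = (10 - 9.5)^2 = .25$, so that

$$\rho_c^2(T, X) = \frac{0 + .25}{0 + .25 + 0} = 1.00 .$$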

There is one theoretically possible case for which CR reliability
is undefined: that in which all the students obtain scores exactly at
the criterion level. In this case both numerator and denominator in any
of the formulas for the CR reliability coefficient would equal zero.
But this case is not a practical possibility; if the lowest passing
score is k items, the criterion is actually k minus some fraction of
an item. However, it is possible for a test to have CR reliability
equal to zero. This will happen when the mean score falls exactly at
the criterion and the NR reliability equals zero.